Morphosyntactic Parser for Brazilian Portuguese: Methodology for Development and Assessment

Authors

  • Izabel Christine Seara
  • Fernando Santana Pacheco
  • Sandra Ghizoni Kafka
  • Rui Seara
Abstract

In text-to-speech (TTS) systems, effective morphosyntactic classification is important for improving the prosody of synthesized speech as well as the pronunciation of words subject to vocalic alternation. This work presents a methodology for developing and assessing an ad hoc morphosyntactic parser for a Brazilian Portuguese TTS system. The parser is composed of a dictionary and a set of rules structured in four levels. Development consisted first of creating a large annotated dataset and then of incrementally developing rules for morphosyntactic classification. With this approach, the classification process achieves an accuracy of 98.59% for words and 80.66% for sentences on a specific dataset.
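The abstract reports two accuracy figures, one per word and one per sentence. The sketch below shows one way such figures could be computed. It assumes that a sentence counts as correct only when every word in it receives the right tag; the abstract does not state the scoring rule, so this is an assumption. The function name `word_and_sentence_accuracy`, the tag labels (`ART`, `N`, `V`, `PRON`), and the toy data are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# A sentence is a list of (word, tag) pairs. Tag names here are illustrative only.
TaggedSentence = List[Tuple[str, str]]


def word_and_sentence_accuracy(
    gold: List[TaggedSentence], predicted: List[TaggedSentence]
) -> Tuple[float, float]:
    """Return (word accuracy, sentence accuracy) as fractions in [0, 1].

    Assumption: a sentence is correct only if all of its tags match the gold tags.
    """
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and the same length")

    total_words = 0
    correct_words = 0
    correct_sentences = 0
    for gold_sent, pred_sent in zip(gold, predicted):
        if not gold_sent or len(gold_sent) != len(pred_sent):
            raise ValueError("each sentence pair must be non-empty and aligned")
        # Count the positions where the predicted tag equals the gold tag.
        matches = sum(1 for (_, g), (_, p) in zip(gold_sent, pred_sent) if g == p)
        total_words += len(gold_sent)
        correct_words += matches
        if matches == len(gold_sent):
            correct_sentences += 1

    return correct_words / total_words, correct_sentences / len(gold)


# Toy data: "jogo" is a noun in the first sentence and a verb in the second.
gold = [
    [("o", "ART"), ("jogo", "N")],
    [("eu", "PRON"), ("jogo", "V")],
]
predicted = [
    [("o", "ART"), ("jogo", "N")],
    [("eu", "PRON"), ("jogo", "N")],  # one wrong tag makes the whole sentence wrong
]

word_acc, sent_acc = word_and_sentence_accuracy(gold, predicted)
print(f"word accuracy: {word_acc:.2%}, sentence accuracy: {sent_acc:.2%}")
```

Under this all-or-nothing rule, a single mistagged word makes its whole sentence count as wrong, which is one plausible reason a sentence-level figure would sit well below a word-level one.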

Similar resources

On the Development and Evaluation of a Brazilian Portuguese Discourse Parser

We present in this paper the development process and the evaluation procedure of a Brazilian Portuguese discourse parser called DiZer. Based on Rhetorical Structure Theory, DiZer is a symbolic cue phrase-based analyzer that makes use of discourse templates learned from a corpus of scientific texts to identify and build the discourse structure of texts. DiZer evaluation shows satisfactory result...

‘Minor’ Languages, ‘Broken’ Translations: On Brazilian Reworkings of an Albanian Novel

This essay approaches the challenges of global translation in the 21st century from what might still be considered a somewhat uncommon example: a direct translation of Ismail Kadaré's 1978 novel Prill e thyër (Broken April) from the original Albanian into Brazilian Portuguese in 2001. Not only does it examine and compare lexical elements in the source and target texts and the usage of translato...

Grammatical Annotation of Historical Portuguese: Generating a Corpus-Based Diachronic Dictionary

In this paper, we present an automatic system for the morphosyntactic annotation and lexicographical evaluation of historical Portuguese corpora. Using rule-based orthographical normalization, we were able to apply a standard parser (PALAVRAS) to historical data (Colonia corpus) and to achieve accurate annotation for both POS and syntax. By aligning original and standardized word forms, our met...

CoGrOO: a Brazilian-Portuguese Grammar Checker based on the CETENFOLHA Corpus

This paper describes an ongoing Portuguese-language grammar checker project called CoGrOO, Corretor Gramatical para OpenOffice (Grammar Checker for OpenOffice), based on CETENFOLHA, a morphosyntactically annotated Brazilian Portuguese corpus. Two of its features are highlighted: a hybrid architecture that mixes rules and statistics, and its status as a free software project. This project aims at checking grammatical error...

Adaptation of Clinical Evaluation of Language Functions--4th Edition to Brazilian Portuguese.

PURPOSES: To translate and adapt the Clinical Evaluation of Language Functions--4th Edition (CELF-4) to Brazilian Portuguese. METHOD: One hundred and sixty normal language development school children between the ages of seven and ten, half from public schools and the other half from private schools, both located on the east side of São Paulo. RESULTS: CELF-4's translation and adjustment to Bra...

Publication date: 2009